Problem Set 1 - Exploring 1 Variable

Exploring Diamond Price Distribution

## Classes 'tbl_df', 'tbl' and 'data.frame':    53940 obs. of  10 variables:
##  $ carat  : num  0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
##  $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
##  $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
##  $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
##  $ depth  : num  61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
##  $ table  : num  55 61 65 58 58 57 57 55 61 61 ...
##  $ price  : int  326 326 327 334 335 336 336 337 337 338 ...
##  $ x      : num  3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
##  $ y      : num  3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
##  $ z      : num  2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
## 
##     D     E     F     G     H     I     J 
##  6775  9797  9542 11292  8304  5422  2808
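The structure listing and color counts above can be reproduced with a short sketch. It assumes the data is the `diamonds` data set bundled with ggplot2, which matches the 53,940 × 10 structure shown.

```r
# Assumes ggplot2 is installed; it ships the diamonds data set.
library(ggplot2)

# Column types and the first few values of each variable.
str(diamonds)

# Number of diamonds in each color grade (D is the best grade, J the worst).
color_counts <- table(diamonds$color)
print(color_counts)
```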
## Warning: Removed 5222 rows containing non-finite values (stat_bin).
## Warning: Removed 33930 rows containing non-finite values (stat_bin).
## Warning: Removed 1 rows containing missing values (geom_bar).

## [1] 1729
## [1] 0
## [1] 1656
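The three unlabeled counts above look like counts of diamonds below or above fixed price thresholds. The thresholds in this sketch ($500, $250 and $15,000) are an assumption; the printed 0 at least fits a cutoff below the minimum price of $326.

```r
library(ggplot2)

# Hypothetical thresholds: the original output does not label its counts.
n_under_500  <- sum(diamonds$price < 500)
n_under_250  <- sum(diamonds$price < 250)
n_15000_plus <- sum(diamonds$price >= 15000)

c(under_500 = n_under_500, under_250 = n_under_250, at_least_15000 = n_15000_plus)
```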
## Warning: Removed 33930 rows containing non-finite values (stat_bin).

## Warning: Removed 1 rows containing missing values (geom_bar).

Exploring Diamond Price Distribution by Cut

## diamonds$cut: Fair
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     337    2050    3282    4359    5206   18574 
## -------------------------------------------------------- 
## diamonds$cut: Good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     327    1145    3050    3929    5028   18788 
## -------------------------------------------------------- 
## diamonds$cut: Very Good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     336     912    2648    3982    5373   18818 
## -------------------------------------------------------- 
## diamonds$cut: Premium
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     326    1046    3185    4584    6296   18823 
## -------------------------------------------------------- 
## diamonds$cut: Ideal
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     326     878    1810    3458    4678   18806
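Per-cut summaries in this form are what `by()` prints when applied with `summary`. A faceted histogram with free y scales keeps the cuts comparable despite their very different counts. This is a sketch; the bin width of 500 is an arbitrary choice, not taken from the original.

```r
library(ggplot2)

# Five-number summary plus mean of price within each cut.
price_by_cut <- by(diamonds$price, diamonds$cut, summary)
print(price_by_cut)

# One histogram panel per cut; free y scales because Ideal has far
# more diamonds than Fair. binwidth = 500 is an arbitrary choice.
p_cut <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(binwidth = 500) +
  facet_wrap(~cut, scales = "free_y")
print(p_cut)
```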

Exploring Diamond Log(Price/Carat) Distribution by Cut

## diamonds$cut: Fair
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.063   7.917   8.146   8.156   8.415   9.297 
## -------------------------------------------------------- 
## diamonds$cut: Good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   6.985   7.781   8.192   8.150   8.474   9.676 
## -------------------------------------------------------- 
## diamonds$cut: Very Good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.038   7.754   8.190   8.177   8.520   9.789 
## -------------------------------------------------------- 
## diamonds$cut: Premium
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   6.958   7.860   8.233   8.238   8.580   9.746 
## -------------------------------------------------------- 
## diamonds$cut: Ideal
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.011   7.806   8.104   8.157   8.469   9.746
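The values in these summaries (roughly 7 to 9.8) are far below actual prices per carat, so they appear to be natural logarithms of price per carat: e^6.958 is about 1,050 and e^9.789 about 17,800 dollars per carat. A sketch under that assumption:

```r
library(ggplot2)

# Natural log of price per carat; log() in R is the natural log by default.
log_ppc <- log(diamonds$price / diamonds$carat)

# Summary of the log ratio within each cut.
by(log_ppc, diamonds$cut, summary)

# Histogram of the log ratio, one panel per cut (binwidth is arbitrary).
p_ppc <- ggplot(diamonds, aes(x = log(price / carat))) +
  geom_histogram(binwidth = 0.1) +
  facet_wrap(~cut, scales = "free_y")
print(p_ppc)
```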

Exploring Diamond Price and Log(Price/Carat) Distribution by Color

## diamonds$color: D
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.028   7.806   8.135   8.169   8.466   9.789 
## -------------------------------------------------------- 
## diamonds$color: E
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   6.983   7.796   8.088   8.139   8.414   9.589 
## -------------------------------------------------------- 
## diamonds$color: F
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.063   7.858   8.159   8.211   8.507   9.537 
## -------------------------------------------------------- 
## diamonds$color: G
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.038   7.839   8.158   8.210   8.613   9.430 
## -------------------------------------------------------- 
## diamonds$color: H
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   6.958   7.782   8.248   8.187   8.542   9.229 
## -------------------------------------------------------- 
## diamonds$color: I
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.049   7.760   8.237   8.175   8.556   9.148 
## -------------------------------------------------------- 
## diamonds$color: J
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   6.985   7.849   8.237   8.147   8.503   9.065

## diamonds$color: D
## [1] 0.6577948
## -------------------------------------------------------- 
## diamonds$color: E
## [1] 0.6578667
## -------------------------------------------------------- 
## diamonds$color: F
## [1] 0.7365385
## -------------------------------------------------------- 
## diamonds$color: G
## [1] 0.7711902
## -------------------------------------------------------- 
## diamonds$color: H
## [1] 0.9117991
## -------------------------------------------------------- 
## diamonds$color: I
## [1] 1.026927
## -------------------------------------------------------- 
## diamonds$color: J
## [1] 1.162137
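The per-color values above are consistent with the mean carat weight within each color grade: heavier stones concentrate in the worse colors. A sketch, assuming that is what was computed:

```r
library(ggplot2)

# Mean carat weight per color grade, printed one block per level by by().
mean_carat <- by(diamonds$carat, diamonds$color, mean)
print(mean_carat)

# The same numbers as a plain named vector (levels stay in D..J order).
mean_carat_vec <- sapply(split(diamonds$carat, diamonds$color), mean)
mean_carat_vec
```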
##          
##              D    E    F    G    H    I    J
##   (0,0.5] 2898 4276 3418 4147 2415 1316  462
##   (0.5,1] 2556 3629 3523 3426 2252 1426  694
##   (1,1.5] 1064 1476 2040 2800 2409 1436  835
##   (1.5,2]  213  338  440  679  764  685  434
##   (2,2.5]   40   73  117  228  429  526  350
##   (2.5,3]    3    4    3   11   29   20   24
##   (3,6]      1    1    1    1    6   13    9
## [1] 96.00667
## [1] 6775
## 
## (0,0.5] (0.5,1] (1,1.5] (1.5,2] (2,2.5] (2.5,3]   (3,6] 
##    2898    2556    1064     213      40       3       1
## 
##   D   E   F   G   H   I   J 
##  44  78 121 240 464 559 383
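The later outputs can be checked against the cross-tabulation by simple arithmetic: each column sums to the color totals (the D column gives 6775), and adding the three bins above 2 carats gives 44, 78, 121, 240, 464, 559 and 383. The sketch below copies the printed counts into a matrix; the binning call in the comment is inferred from the row labels and is an assumption.

```r
# Counts copied from the cross-tabulation printed above. It was presumably
# built with something like
#   table(cut(diamonds$carat, breaks = c(0, 0.5, 1, 1.5, 2, 2.5, 3, 6)), diamonds$color)
counts <- matrix(
  c(2898, 4276, 3418, 4147, 2415, 1316, 462,
    2556, 3629, 3523, 3426, 2252, 1426, 694,
    1064, 1476, 2040, 2800, 2409, 1436, 835,
     213,  338,  440,  679,  764,  685, 434,
      40,   73,  117,  228,  429,  526, 350,
       3,    4,    3,   11,   29,   20,  24,
       1,    1,    1,    1,    6,   13,   9),
  nrow = 7, byrow = TRUE,
  dimnames = list(
    carat = c("(0,0.5]", "(0.5,1]", "(1,1.5]", "(1.5,2]",
              "(2,2.5]", "(2.5,3]", "(3,6]"),
    color = c("D", "E", "F", "G", "H", "I", "J")))

# Column sums recover the per-color totals reported by table(diamonds$color).
color_totals <- colSums(counts)

# Summing the three bins above 2 carats counts the heavy stones per color.
over_2ct <- colSums(counts[c("(2,2.5]", "(2.5,3]", "(3,6]"), ])

print(color_totals)
print(over_2ct)
```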

Interquartile range

##      0%     25%     50%     75%    100% 
##   357.0   911.0  1838.0  4213.5 18693.0
##      0%     25%     50%     75%    100% 
##   335.0  1860.5  4234.0  7695.0 18710.0
## [1] 3302.5
## [1] 5834.5
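Each IQR is the 75th percentile minus the 25th: 4213.5 − 911 = 3302.5 and 7695 − 1860.5 = 5834.5. The two quantile vectors appear to be prices for the best (D) and worst (J) color grades; that mapping is an assumption in the sketch below. `IQR()` uses the same default quantile definition as `quantile()`, so the two agree.

```r
library(ggplot2)

# Hypothetical mapping: first vector = best color (D), second = worst (J).
price_d <- diamonds$price[diamonds$color == "D"]
price_j <- diamonds$price[diamonds$color == "J"]

quantile(price_d)
quantile(price_j)

# IQR() returns the 75th percentile minus the 25th.
iqr_d <- IQR(price_d)
iqr_j <- IQR(price_j)
c(D = iqr_d, J = iqr_j)
```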

Price per carat by color boxplot
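The box plot itself did not survive. A minimal sketch of such a plot follows; the axis labels are illustrative additions.

```r
library(ggplot2)

# One box per color grade; price / carat is computed inside aes().
p_box <- ggplot(diamonds, aes(x = color, y = price / carat)) +
  geom_boxplot() +
  labs(x = "Color grade", y = "Price per carat (USD)")
print(p_box)

# Numeric companion to the plot: summaries of price per carat per color.
by(diamonds$price / diamonds$carat, diamonds$color, summary)
```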


Weight distribution
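No output survives for this heading. One way to show the carat (weight) distribution is a frequency polygon; the plot type and the bin width of 0.1 carat are assumptions, not taken from the original.

```r
library(ggplot2)

# Frequency polygon of carat weight; binwidth is an illustrative choice.
p_carat <- ggplot(diamonds, aes(x = carat)) +
  geom_freqpoly(binwidth = 0.1)
print(p_carat)
```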

Problem Set 2 - Exploring 2 Variables

Price and Volume



## [1] 0.8844352
## [1] 0.8654209
## [1] 0.8612494
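The three correlations appear to pair price with the x, y and z dimensions in that order (an assumption based on the column order). A sketch computing all three in one call:

```r
library(ggplot2)

# Pearson correlation of price with each physical dimension (in mm).
r_xyz <- sapply(diamonds[c("x", "y", "z")],
                function(v) cor(diamonds$price, v))
r_xyz
```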

##   0%  25%  50%  75% 100% 
## 43.0 61.0 61.8 62.5 79.0
## [1] -0.0106474
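The depth quantiles and the near-zero correlation can be reproduced as follows. A correlation this close to 0 means depth on its own is a poor linear predictor of price.

```r
library(ggplot2)

# Quantiles of depth (total depth percentage).
quantile(diamonds$depth)

# Correlation of depth with price.
r_depth <- cor(diamonds$depth, diamonds$price)
r_depth
```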

## [1] 0.9235455
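This value is consistent with the correlation between price and an approximate volume x × y × z after excluding implausible volumes. The exclusion rule below (volume of 0, or 800 and above) is an assumption.

```r
library(ggplot2)

# Approximate volume in cubic millimetres from the three dimensions.
volume <- diamonds$x * diamonds$y * diamonds$z

# Assumed filter: drop zero volumes (missing dimensions) and the few
# extreme volumes at or above 800.
keep <- volume > 0 & volume < 800
r_vol <- cor(diamonds$price[keep], volume[keep])
r_vol
```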

## # A tibble: 273 x 6
##    carat mean_price median_price min_price max_price `n()`
##    <dbl>      <dbl>        <dbl>     <dbl>     <dbl> <int>
##  1  0.20   365.1667          367       345       367    12
##  2  0.21   380.2222          386       326       394     9
##  3  0.22   391.4000          404       337       470     5
##  4  0.23   486.1433          498       326       688   293
##  5  0.24   505.1850          491       336       963   254
##  6  0.25   550.9245          548       357      1186   212
##  7  0.26   550.8972          554       337       814   253
##  8  0.27   574.7597          575       361       893   233
##  9  0.28   580.1212          586       360       828   198
## 10  0.29   601.1923          607       334      1776   130
## # ... with 263 more rows
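A dplyr pipeline along these lines produces the per-carat summary above. Naming the count column (`n` here, an illustrative choice) avoids the backticked `n()` header in the printed tibble.

```r
library(ggplot2)
library(dplyr)

# One row per distinct carat value with price statistics and a count.
diamonds_by_carat <- diamonds %>%
  group_by(carat) %>%
  summarise(mean_price = mean(price),
            median_price = median(price),
            min_price = min(price),
            max_price = max(price),
            n = n()) %>%
  arrange(carat)
diamonds_by_carat
```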

Problem Set 3 - Exploring 3 Variables

Price by Color and Cut
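The plot for this heading did not survive. One way to show price across both variables is a price histogram faceted by color and filled by cut; the facet and fill assignment, and the bin width, are assumptions.

```r
library(ggplot2)

# One panel per color grade; bar segments are stacked by cut.
p_color_cut <- ggplot(diamonds, aes(x = price, fill = cut)) +
  geom_histogram(binwidth = 500) +
  facet_wrap(~color)
print(p_color_cut)
```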

Price by Table and Cut

## diamonds$cut: Fair
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   49.00   56.00   58.00   59.05   61.00   95.00 
## -------------------------------------------------------- 
## diamonds$cut: Good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   51.00   56.00   58.00   58.69   61.00   66.00 
## -------------------------------------------------------- 
## diamonds$cut: Very Good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   44.00   56.00   58.00   57.96   59.00   66.00 
## -------------------------------------------------------- 
## diamonds$cut: Premium
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   51.00   58.00   59.00   58.75   60.00   62.00 
## -------------------------------------------------------- 
## diamonds$cut: Ideal
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   43.00   55.00   56.00   55.95   57.00   63.00
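The table summaries by cut can be reproduced with `by()`. A scatter plot of price against table, colored by cut, is a natural companion; the x-axis limits are a readability choice (an assumption) and drop the few extreme table values with a warning.

```r
library(ggplot2)

# Summary of the table percentage within each cut.
table_by_cut <- by(diamonds$table, diamonds$cut, summary)
table_by_cut

# Price against table, colored by cut; limits trim the extremes.
p_table <- ggplot(diamonds, aes(x = table, y = price, color = cut)) +
  geom_point() +
  scale_x_continuous(limits = c(50, 80), breaks = seq(50, 80, 2))
print(p_table)
```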

Price by Volume and Clarity
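The plot for this heading did not survive. A sketch of price against volume colored by clarity; trimming the top 1% of volumes and using a log price axis are assumptions made for readability.

```r
library(ggplot2)

# Copy the data and add an approximate volume column (x * y * z).
d <- diamonds
d$volume <- d$x * d$y * d$z

# Assumed: trim the top 1% of volumes and put price on a log10 scale.
p_vol <- ggplot(d, aes(x = volume, y = price, color = clarity)) +
  geom_point() +
  xlim(0, quantile(d$volume, 0.99)) +
  scale_y_log10()
print(p_vol)
```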

Create a scatter plot of the price/carat ratio of diamonds with cut on the x-axis. Color the points by diamond color and facet the plot by clarity.
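A minimal sketch of this plot. Because cut is discrete, plain points would stack in five vertical lines, so the sketch jitters them horizontally; the jitter width and transparency are illustrative choices, not part of the task.

```r
library(ggplot2)

# Cut on x, price per carat on y, points colored by color grade,
# one panel per clarity grade.
p_ppc_cut <- ggplot(diamonds, aes(x = cut, y = price / carat, color = color)) +
  geom_jitter(width = 0.3, alpha = 0.5) +
  facet_wrap(~clarity)
print(p_ppc_cut)
```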